- VOICE DIGITIZATION AND REPRODUCTION ON THE
- IBM PC/XT AND PC/AT BUILT-IN SPEAKER
- --------------------------------------------
-
- Alan D. Jones July 1988
-
-
- The speaker on the PC and its associated driver circuitry is quite
- simple and crude, having been designed primarily for creating single
- square-wave tones of various audio frequencies. This speaker is typically
- driven by a pair of transistors used as a current amplifier, which is in turn
- driven directly by the output of a TTL gate. This results in only two
- possibilities of voltage across the voice coil: 0 volts and 5 volts. Any
- sound to be reproduced by this system must be reduced to an approximation
- in the form of a stream of constant-amplitude, variable-width rectangular
- pulses.
- Examination of a speech waveform on an oscilloscope display quickly
- tells us that it is not going to be possible to even remotely mimic this
- waveform under the above restrictions. Much of the information contained
- in the waveform is in the form of amplitude variations, and this is the
- one attribute we cannot reproduce. It is initially tempting to try to
- use the technique of the "class D" amplifier to create the waveform, using
- high-speed pulse width modulation and depending on the mechanical
- characteristics of the speaker and those of the human ear to provide the
- missing low-pass filtering. Assuming the sampling rate to be 8 kHz (based
- on the Nyquist criterion) and, to conserve memory, assuming the samples
- to contain only 4 bits of amplitude information (16 levels), we can see
- that data accumulates at a rate of 4k bytes per second, which is certainly
- acceptable. The problem comes when we try to play back the sound. Pulses
- occur at intervals of 125 microseconds, which doesn't seem too bad, but
- since each pulse can have 16 possible widths, it is necessary to time the
- pulses with a resolution of well under 8 microseconds. This is only a
- couple of instruction times on a 4.77 MHz XT, and even on a fast 80386
- it doesn't give the CPU much time between pulse edges to shift bits, read and
- increment a pointer, check the pointer to see if it's done yet, etc., not
- to mention the difficulty of servicing unrelated interrupts.
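
The arithmetic behind the pulse-width-modulation estimate above can be checked with a short script. This is only a back-of-the-envelope sketch: the sample rate, bit depth and CPU clock are the figures quoted in the text, and the variable names are invented for illustration.

```python
# Back-of-the-envelope numbers for the 4-bit PWM scheme described above.
SAMPLE_RATE_HZ = 8_000       # sampling rate quoted in the text
BITS_PER_SAMPLE = 4          # 16 amplitude levels
CPU_CLOCK_HZ = 4_770_000     # 4.77 MHz XT clock quoted in the text

# Storage needed per second of speech.
bytes_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 8

# One pulse per sample, so the pulse period is the sample period.
pulse_period_us = 1_000_000 / SAMPLE_RATE_HZ

# Each pulse can take one of 16 widths, so the period is cut into 16 steps.
levels = 2 ** BITS_PER_SAMPLE
step_us = pulse_period_us / levels

# CPU clock cycles available inside one timing step.
clocks_per_step = CPU_CLOCK_HZ * step_us / 1_000_000

print(f"storage rate      : {bytes_per_second:.0f} bytes/s")
print(f"pulse period      : {pulse_period_us:.1f} us")
print(f"timing resolution : {step_us:.4f} us")
print(f"CPU clocks/step   : {clocks_per_step:.1f}")
```

The timing step works out to a little under 8 microseconds, which is only a few dozen clock cycles on a 4.77 MHz machine.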
- The search for simpler (but still usable) and less CPU-intensive
- methods of reproducing speech leads to the question of what information
- in the waveform we can discard without an unacceptable loss of
- intelligibility. My experiments with running speech signals through
- a graphic equalizer revealed that the lower-frequency components, those
- which are most visible to the eye on the oscilloscope, are actually of
- minimal importance in understanding speech. This is also demonstrated by
- the fact that a whisper is just as understandable as normal speech, but
- does not make use of vibrating vocal cords, which are the primary source
- of low-frequency components in the voice.
- The digitizer circuit consists of two stages of voltage amplification with
- some high-pass filtering built into the coupling capacitors, followed by a
- differentiator. The output of the differentiator is fed to a voltage
- comparator, thus producing an output which has approximately the following
- relationship to the input from the microphone: if the derivative of the speech
- waveform is positive, then the output is logic zero; if the derivative of the
- speech waveform is negative, then the output is logic one. The transition
- timing at the output is entirely analog in nature; there is no synchronizing
- clock signal anywhere in the circuit.
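
The input/output relationship of the differentiator and comparator can be imitated in software. The sketch below is a discrete-time approximation only: the real circuit is analog and unclocked, and the function name, the test tone, and the choice to hold the previous output when the slope is exactly zero are all assumptions made for illustration. The amplifier and high-pass stages are not modelled.

```python
import math

def derivative_sign_bits(samples):
    """Return one bit per step: 0 while the waveform rises, 1 while it falls."""
    bits = []
    previous_bit = 0  # arbitrary starting state (assumption)
    for earlier, later in zip(samples, samples[1:]):
        slope = later - earlier  # crude stand-in for the differentiator
        if slope > 0:
            bit = 0              # positive derivative -> logic zero
        elif slope < 0:
            bit = 1              # negative derivative -> logic one
        else:
            bit = previous_bit   # flat segment: hold last output (assumption)
        bits.append(bit)
        previous_bit = bit
    return bits

# One cycle of a sine tone sampled at 16 points per cycle.
tone = [math.sin(2 * math.pi * n / 16) for n in range(17)]
bits = derivative_sign_bits(tone)
print("".join(str(b) for b in bits))
```

Note that the amplitude of the input has no effect on the result; only the direction in which the waveform is moving is kept.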
- If the output of this circuit is connected directly to a speaker, the
- resulting sound will still be an understandable version of the input.
- Since the output consists of nothing but a digital bit stream, the job
- of the computer becomes that of simply recording and accurately reproducing
- this bit stream.
- The program operates by reprogramming the 8253 timer chip to produce
- hardware interrupts at a 16.5 kHz rate. The interrupt service routine then
- manipulates the NAND gate driving the speaker based on bits read from the
- file. The 16.5 kHz rate was chosen by trial and error; this is the audible
- "point of diminishing returns", where a further increase in sampling rate
- didn't produce enough of an improvement to warrant the increased memory
- usage.
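
The numbers involved in the playback scheme can be sketched as follows. This is not the program's actual code. It assumes the standard 1,193,182 Hz input clock of the PC's 8253, assumes the bits are stored eight to a byte with the most significant bit first, and assumes the speaker is driven through bit 1 of port 61h; the function names are hypothetical.

```python
PIT_INPUT_HZ = 1_193_182   # 8253 input clock on the PC (assumed standard value)
TARGET_RATE_HZ = 16_500    # interrupt rate quoted in the text

# The 8253 divides its input clock by a 16-bit count, so the real
# rate is the nearest one an integer divisor can give.
divisor = round(PIT_INPUT_HZ / TARGET_RATE_HZ)
actual_rate_hz = PIT_INPUT_HZ / divisor

# One bit per interrupt, eight bits per stored byte.
bytes_per_second = TARGET_RATE_HZ / 8

def unpack_bits(data):
    """Yield the bits of each byte, most significant first (assumed order)."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1

SPEAKER_DATA_BIT = 0x02    # bit 1 of port 61h (assumption about the program)

def port_values(data, base=0x00):
    """Values an interrupt routine might write to port 61h, one per tick.

    `base` stands in for the current port contents, whose other bits
    must be left alone.
    """
    cleared = base & ~SPEAKER_DATA_BIT & 0xFF
    return [cleared | (SPEAKER_DATA_BIT if bit else 0)
            for bit in unpack_bits(data)]

print(f"divisor      : {divisor}")
print(f"actual rate  : {actual_rate_hz:.1f} Hz")
print(f"storage rate : {bytes_per_second} bytes/s")
print([hex(v) for v in port_values(b"\xa5", base=0x30)])
```

Under these assumptions a second of speech takes a little over 2K bytes, about half of what the 4-bit scheme discussed earlier would need.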
- This technique is somewhat limited in its usefulness. It necessitates
- the writing of a "badly behaved" program which not only reprograms the timer
- chip but also totally hogs the CPU for the duration of the voice output.
- Nevertheless, it demonstrates a few interesting things about how humans hear
- speech. I first developed this circuit over a year ago as a rebuttal to
- someone who said "it couldn't be done". Not only can it be done, it is
- actually quite simple. Certainly the circuit could be improved, at the
- possible expense of increased complexity. I'm waiting to hear from some of
- you. If anyone has questions, especially about my sloppy code, I check
- for messages on CIS every three or four days.
-
- - Alan
-
- 74030,554